
    Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks

    Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network or disk bound, as replication is used for fault-tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique, into the storage layer. The key challenge in making a long-running lineage-based storage system is timely data recovery in case of failures. Tachyon addresses this issue by introducing a checkpointing algorithm that guarantees bounded recovery cost, and resource allocation strategies for recomputation under commonly used resource schedulers. Our evaluation shows that Tachyon outperforms in-memory HDFS by 110x for writes. It also improves the end-to-end latency of a realistic workflow by 4x. Tachyon is open source and is deployed at multiple companies. Funding: National Science Foundation (U.S.) (CISE Expeditions Award CCF-1139158); Lawrence Berkeley National Laboratory (Award 7076018); United States. Defense Advanced Research Projects Agency (XData Award FA8750-12-2-0331).
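    The core idea of pushing lineage into the storage layer can be sketched in a few lines. This is a hedged toy model, not Tachyon's actual API: the `Dataset` class, its `op`/`inputs` fields, and the `get` method are hypothetical names used only to illustrate recomputing a lost in-memory dataset from its lineage instead of replicating it on write.

    ```python
    # Toy sketch of lineage-based recovery (hypothetical names, not Tachyon's API):
    # each dataset records the operation and inputs that produced it, so a lost
    # in-memory copy can be recomputed on demand rather than replicated on write.

    class Dataset:
        def __init__(self, name, op=None, inputs=()):
            self.name = name
            self.op = op            # function that (re)computes this dataset
            self.inputs = inputs    # upstream datasets (the lineage)
            self.data = None        # in-memory copy; may be lost on failure

        def get(self):
            if self.data is None:                    # lost or never materialized
                parents = [p.get() for p in self.inputs]
                self.data = self.op(*parents)        # recompute from lineage
            return self.data

    # Usage: derive a dataset, simulate losing it, and recover by recomputation.
    raw = Dataset("raw")
    raw.data = [1, 2, 3, 4]
    doubled = Dataset("doubled", op=lambda xs: [2 * x for x in xs], inputs=(raw,))
    doubled.get()            # materialize once
    doubled.data = None      # simulate a node failure losing the in-memory copy
    assert doubled.get() == [2, 4, 6, 8]   # recovered without any replica
    ```

    The real system additionally bounds how deep such recomputation chains can grow, which is what the checkpointing algorithm mentioned above is for.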

    Static analysis-based approaches for secure software development

    Software security is a matter of major concern for software development enterprises that wish to deliver highly secure software products to their customers. Static analysis is considered one of the most effective mechanisms for adding security to software products. The multitude of static analysis tools that are available provide a large number of raw results that may contain security-relevant information, which may be useful for the production of secure software. Several mechanisms that can facilitate the production of both secure and reliable software applications have been proposed over the years. In this paper, two such mechanisms, particularly the vulnerability prediction models (VPMs) and the optimum checkpoint recommendation (OCR) mechanisms, are theoretically examined, while their potential improvement by using static analysis is also investigated. In particular, we review the most significant contributions regarding these mechanisms, identify their most important open issues, and propose directions for future research, emphasizing the potential adoption of static analysis for addressing the identified open issues. Hence, this paper can act as a reference for researchers who wish to contribute to these subfields, in order to gain a solid understanding of the existing solutions and their open issues that require further research.

    An Efficient Technique for Tracking Nondeterministic Execution and its Applications

    This report describes a technique for using instruction counters to track nondeterminism in the execution of operating system kernels and user programs. The operating system records the number of instructions between consecutive nondeterministic events and information about their nature during normal operation. During an analysis phase, the execution is repeated under the control of a monitor, and the nondeterministic events are applied at the same instructions as during the monitored execution. We describe the application of this technique to four areas. Performance monitoring: the technique can be used to instrument an operating system to capture long traces of memory references. Unlike current techniques, it performs the gathering in a postmortem phase and therefore has negligible effect on the computation itself during the monitoring phase. We expect trace periods that are longer than what existing techniques can capture by orders of magnitude, with little or no noticeable perturbation…
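    The record/replay idea above can be illustrated with a small simulation. This is a deliberate simplification under assumed names (`execute`, `event_steps`, `source`): real implementations count hardware instructions, whereas here a loop counter stands in, and nondeterministic inputs are logged by step number during recording and re-injected at the same step during replay.

    ```python
    import random

    # Toy illustration of instruction-counter replay (simplified: a loop counter
    # stands in for a hardware instruction counter). Recording logs (step, value)
    # for each nondeterministic event; replay re-injects each value at the same
    # step, making the rerun deterministic.

    def execute(steps, event_steps, source, log=None):
        state = 0
        for counter in range(1, steps + 1):
            if counter in event_steps:
                value = source(counter)           # nondeterministic input
                if log is not None:
                    log.append((counter, value))  # record step + value
                state = state * 2 + value
            else:
                state += 1
        return state

    # Record: nondeterministic inputs arrive at "instructions" 3 and 7.
    log = []
    recorded = execute(10, {3, 7}, lambda _: random.randint(0, 9), log=log)

    # Replay: feed the logged values back at the same steps.
    replayed = execute(10, {3, 7}, dict(log).__getitem__)
    assert replayed == recorded
    ```

    The same replay loop is what makes postmortem trace gathering possible: the expensive instrumentation runs only during the repeated, monitored execution, not the original one.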

    Storage Strategies for Fault-Tolerant Video Servers

    We consider the problem of providing high availability in cluster-based video servers. The cluster acts as a parallel processor that provides the aggregate I/O and network bandwidths of the component machines. In such an environment, the failure of one server may affect the availability of the video service or its quality. Existing approaches to this problem fall into two categories. On one hand, there are RAID-like schemes that store error correcting code (ECC) in addition to the video data. Should a failure occur, the unavailable data can be computed on the fly using the ECC and the service continues at the same quality. In cluster-based systems, however, the video data are distributed over several servers and there is no convenient point to reconstruct the missing blocks except at the client. Relying on the client for this task is not desirable, as it may not have the necessary buffering or processing capacity. On the other hand, some argue that the system could just continue operation…
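    The RAID-like reconstruction described above reduces, in its simplest single-parity form, to an XOR over the surviving blocks. The following is a minimal sketch under that assumption (the names `xor_blocks`, `data`, and `parity` are illustrative, not from the paper), showing why the client needs all surviving blocks in one place to rebuild a missing one.

    ```python
    # Minimal single-parity ECC sketch: store an XOR parity block alongside the
    # striped data blocks; any one missing block equals the XOR of the survivors
    # and the parity.

    def xor_blocks(blocks):
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    data = [b"frame-a1", b"frame-b2", b"frame-c3"]   # blocks on three servers
    parity = xor_blocks(data)                        # stored on a fourth server

    # Server holding data[1] fails: rebuild its block from survivors + parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    assert rebuilt == data[1]
    ```

    The paper's concern is exactly where this XOR should run: in a cluster the surviving blocks live on different machines, so the only node that naturally sees all of them is the client.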

    On the Relevance of Communication Costs of Rollback-Recovery Protocols

    Communication overhead has traditionally been the primary metric for evaluating rollback-recovery protocols. This paper reexamines the prominence of this metric in light of the recent increases in processor and network speeds. We introduce a new recovery algorithm for a family of rollback-recovery protocols based on logging. The new algorithm incurs a higher communication overhead during recovery than previous algorithms, but it requires less access to stable storage and imposes no restrictions on the execution of live processes. Experimental results show that the new algorithm performs better than one that is optimized for low communication overhead. These results suggest that in modern environments, latency in accessing stable storage and intrusion of a particular algorithm on the execution of live processes are more important than the number of messages exchanged during recovery.
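    The general checkpoint-plus-log recovery pattern the paper builds on can be sketched as follows. This is a generic, hedged illustration of log-based rollback recovery, not the paper's specific algorithm; the `Process` class and its methods are invented for the example.

    ```python
    # Generic sketch of log-based rollback recovery: checkpoint state
    # periodically, log received messages, and after a crash restore the last
    # checkpoint and deterministically replay the logged messages.

    class Process:
        def __init__(self):
            self.state = 0
            self.checkpoint = 0
            self.log = []                 # messages received since the checkpoint

        def receive(self, msg):
            self.log.append(msg)          # log (stably) before applying
            self.state += msg

        def take_checkpoint(self):
            self.checkpoint = self.state
            self.log.clear()              # older log entries can be garbage-collected

        def recover(self):
            self.state = self.checkpoint  # restore the last checkpoint
            for msg in self.log:          # replay logged messages in order
                self.state += msg

    p = Process()
    p.receive(5)
    p.take_checkpoint()
    p.receive(3)
    p.receive(4)
    before_crash = p.state
    p.state = -999        # simulate a crash corrupting volatile state
    p.recover()
    assert p.state == before_crash == 12
    ```

    The paper's point sits inside this pattern: how often `recover` touches stable storage, and whether live processes must block during it, can matter more than how many messages the replay exchanges.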

    Address trace compression through loop detection and reduction


    Low-Cost Garbage Collection for Causal Message Logging
